Video action segmentation aims to divide a video into a sequence of action segments. Timestamp supervision has recently received much attention because of its lower annotation cost. We find that frames near the boundaries of action segments lie in the transition region between two consecutive actions and have unclear semantics; we call these regions ambiguous intervals. Most existing methods iteratively generate pseudo-labels for all frames in each video to train the segmentation model. However, frames in ambiguous intervals are more likely to be assigned noisy or incorrect pseudo-labels, which degrades performance. We propose a novel framework for training the model under timestamp supervision that consists of two parts. First, pseudo-label ensembling generates pseudo-label sequences that contain ambiguous intervals, in which the frames have no pseudo-labels. Second, iterative clustering propagates the pseudo-labels into the ambiguous intervals by clustering, and thus updates the pseudo-label sequences used to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
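The propagation step described above can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the clustering procedure, so the greedy boundary-growth rule, the use of segment centroids, and all function names below are assumptions made for illustration. Unlabeled frames are marked with None, and in each pass one boundary frame per ambiguous interval joins the neighbouring segment whose feature centroid is closer.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ambiguous_runs(labels):
    """Return inclusive (start, end) index pairs of consecutive unlabeled frames."""
    runs, start = [], None
    for i, lab in enumerate(labels):
        if lab is None and start is None:
            start = i
        elif lab is not None and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(labels) - 1))
    return runs


def segment_centroid(features, labels, idx, step):
    """Centroid of the labeled segment containing frame idx, walking by step (+1 or -1)."""
    lab, members = labels[idx], []
    while 0 <= idx < len(labels) and labels[idx] == lab:
        members.append(features[idx])
        idx += step
    return centroid(members)


def propagate_once(features, labels):
    """Hypothetical rule: label one boundary frame per ambiguous interval."""
    new_labels = list(labels)
    for start, end in ambiguous_runs(labels):
        candidates = []  # (distance to centroid, frame index, label to assign)
        if start > 0:
            c = segment_centroid(features, labels, start - 1, -1)
            candidates.append((dist(features[start], c), start, labels[start - 1]))
        if end < len(labels) - 1:
            c = segment_centroid(features, labels, end + 1, +1)
            candidates.append((dist(features[end], c), end, labels[end + 1]))
        if candidates:
            _, frame, lab = min(candidates, key=lambda item: item[0])
            new_labels[frame] = lab
    return new_labels


def propagate(features, labels):
    """Repeat until no ambiguous frame remains (or nothing can be propagated)."""
    while any(lab is None for lab in labels):
        updated = propagate_once(features, labels)
        if updated == labels:  # no labeled neighbour anywhere, give up
            break
        labels = updated
    return labels


# Toy example: 1-D features, two timestamp-labeled frames, four ambiguous frames.
features = [[0.0], [0.1], [0.2], [0.7], [1.0], [1.0]]
labels = ["a", None, None, None, None, "b"]
final_labels = propagate(features, labels)
```

In a training loop, the features would be re-extracted from the updated model between passes; the sketch keeps them fixed for brevity.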
Knowledge graph reasoning (KGR), which aims to deduce new facts from existing ones based on logic rules mined from knowledge graphs (KGs), has become a fast-growing research direction. It has been shown to significantly benefit the use of KGs in many AI applications, such as question answering and recommendation systems. According to the graph type, existing KGR models can be roughly divided into three categories: static models, temporal models, and multi-modal models. Early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. Many models have recently been developed for these tasks, but no survey paper or open-source repository comprehensively summarizes and discusses them. To fill this gap, we survey knowledge graph reasoning from static to temporal and then to multi-modal KGs. Concretely, we introduce and discuss, in turn, the preliminaries, summaries of KGR models, and typical datasets. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: https://github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
Adversarial attacks can easily fool object recognition systems based on deep neural networks (DNNs). Although many defense methods have been proposed in recent years, most of them can still be adaptively evaded. One reason for the weak adversarial robustness may be that DNNs are supervised only by category labels and lack the part-based inductive bias of human recognition. Inspired by recognition-by-components, a well-known theory in cognitive psychology, we propose a novel object recognition model, ROCK (Recognizing Object by Components with human prior Knowledge). It first segments parts of objects from images, then scores the part segmentation results with predefined human prior knowledge, and finally outputs a prediction based on the scores. The first stage of ROCK corresponds to the process of decomposing objects into parts in human vision. The second stage corresponds to the decision process of the human brain. ROCK shows better robustness than classical recognition models across various attack settings. These results encourage researchers to rethink the rationality of currently widely used DNN-based object recognition models and to explore the potential of part-based models, once important but recently ignored, for improving robustness.
Designing and analyzing model-based RL (MBRL) algorithms with guaranteed monotonic improvement has been challenging, mainly because of the interdependence between policy optimization and model learning. Existing discrepancy bounds generally ignore the impact of model shifts, and the corresponding algorithms are prone to performance degradation caused by drastic model updates. In this work, we first propose a novel and general theoretical scheme for a non-decreasing performance guarantee of MBRL. The bounds we then derive reveal the relationship between model shifts and performance improvement. These findings lead us to formulate a constrained lower-bound optimization problem that ensures the monotonicity of MBRL. A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns. Motivated by these analyses, we design a simple but effective algorithm, CMLO (Constrained Model-shift Lower-bound Optimization), which introduces an event-triggered mechanism that flexibly determines when to update the model. Experiments show that CMLO surpasses other state-of-the-art methods and yields a performance boost when combined with various policy optimization methods.
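Few-shot learning models learn representations from limited human annotations, and this learning paradigm has proven practical in various tasks. However, the limited data prevents the model from sufficiently exploring semantic information. To tackle this problem, we introduce knowledge distillation into the few-shot object detection learning paradigm. We further run a motivating experiment, which shows that during knowledge distillation the empirical error of the teacher model degrades the prediction performance of the few-shot object detection model (the student). To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from the standpoint of causal theory and, accordingly, develop a structural causal model. Following this theoretical guidance, we propose a knowledge distillation method based on backdoor adjustment for the few-shot object detection task, namely Disentangle and Remerge (D&R), to perform conditional causal intervention on the corresponding structural causal model. Theoretically, we provide an extended definition for the backdoor criterion, the general backdoor path, which can expand the theoretical application boundary of the backdoor criterion in specific cases. Empirically, experiments on multiple benchmark datasets demonstrate that D&R yields significant performance gains in few-shot object detection.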
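Prevailing graph neural network models have achieved significant progress in graph representation learning. In this paper, however, we uncover a consistently overlooked phenomenon: a pre-trained graph representation learning model tested on full graphs underperforms the same model tested on well-pruned graphs. This observation reveals that graphs contain confounders, which may interfere with the model's learning of semantic information, and that current graph representation learning methods do not eliminate their influence. To tackle this issue, we propose Robust Causal Graph Representation Learning (RCGRL) to learn graph representations that are robust to confounding effects. RCGRL introduces an active approach to generating instrumental variables under unconditional moment restrictions, which enables the graph representation learning model to eliminate confounders and thereby capture discriminative information that is causally related to downstream predictions. We provide theorems and proofs to guarantee the theoretical effectiveness of the proposed approach. Empirically, we conduct extensive experiments on a synthetic dataset and multiple benchmark datasets. The results demonstrate that, compared with state-of-the-art methods, RCGRL achieves better prediction performance and generalization ability.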
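The ability to handle objects in cluttered environments has long been anticipated by the robotics community. However, most works focus merely on manipulation rather than on revealing the semantic information hidden among cluttered objects. In this work, we introduce a scene graph for embodied exploration in cluttered scenes to address this problem. To validate our method in cluttered scenarios, we adopt the Manipulation Question Answering (MQA) task as our test benchmark, which requires an embodied robot to have both active exploration ability and semantic understanding of vision and language. For this task, we propose an imitation learning method to generate manipulations for exploration. Meanwhile, a VQA model based on a dynamic scene graph is adopted to comprehend the series of RGB frames from the manipulator's wrist camera, along with every manipulation step, in order to answer questions within our framework. Our proposed framework is effective for the MQA task, which is representative of tasks in cluttered scenes.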
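Graph contrastive learning has been proven to be an effective task for pre-training graph neural networks (GNNs). However, one key issue may seriously impede the representation power of existing works: the positive instances created by current methods often miss crucial information in the graph, or are even illegal instances (such as graphs that violate chemical rules in molecular generation). To remedy this issue, we propose to select positive graph instances directly from the existing graphs in the training set, which preserves both legality and similarity to the target graphs. Our selection is based on certain domain-specific pairwise similarity measurements, as well as on sampling from a hierarchical graph that encodes the similarity relations among graphs. In addition, we develop an adaptive node-level pre-training method that dynamically masks nodes so that the masked nodes are evenly distributed in the graph. We conduct extensive experiments on 13 graph classification and node classification benchmark datasets from various domains. The results demonstrate that GNN models pre-trained with our strategies can outperform both models trained from scratch and variants obtained by existing methods.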
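Many adaptations of transformers have emerged to address single-modal vision tasks, in which self-attention modules are stacked to handle input sources such as images. Intuitively, feeding multiple modalities of data to vision transformers could improve performance, yet the intra-modal attention weights may also be diluted, which could undermine the final performance. In this paper, we propose a multimodal token fusion method (TokenFusion) tailored for transformer-based vision tasks. To fuse multiple modalities effectively, TokenFusion dynamically detects uninformative tokens and substitutes them with projected and aggregated inter-modal features. Residual positional alignment is also adopted to enable explicit use of the inter-modal alignments after fusion. The design of TokenFusion allows the transformer to learn correlations among multimodal features while the single-modal transformer architecture remains largely intact. Extensive experiments on a variety of homogeneous and heterogeneous modalities demonstrate that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point clouds and images. Our code is available at https://github.com/yikaiw/tokenfusion.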
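The token substitution idea can be made concrete with a minimal sketch for two spatially aligned modalities. It is not the code from the repository above: the function name, the per-token importance scores, the threshold value, and the identity projection are all assumptions chosen for illustration, and with only two modalities the "aggregation" reduces to taking the single other modality's token.

```python
def fuse_tokens(tokens_a, tokens_b, scores_a, scores_b,
                threshold=0.02, project=lambda token: token):
    """Replace low-score tokens of each modality with the projected token
    of the other modality at the same position (hypothetical sketch)."""
    if not (len(tokens_a) == len(tokens_b) == len(scores_a) == len(scores_b)):
        raise ValueError("both modalities must have the same number of tokens")
    fused_a = [project(tb) if sa < threshold else ta
               for ta, tb, sa in zip(tokens_a, tokens_b, scores_a)]
    fused_b = [project(ta) if sb < threshold else tb
               for ta, tb, sb in zip(tokens_a, tokens_b, scores_b)]
    return fused_a, fused_b


# Toy example: token 1 of modality A and token 0 of modality B are uninformative.
rgb = [[1.0], [2.0]]
depth = [[10.0], [20.0]]
fused_rgb, fused_depth = fuse_tokens(rgb, depth, [0.5, 0.0], [0.0, 0.5])
```

Because only individual tokens are swapped, the rest of each single-modal transformer layer can stay unchanged, which matches the claim that the architecture remains largely intact.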
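Nowadays, cameras equipped with AI systems can capture and analyze images to detect people automatically. However, an AI system can make mistakes when it receives deliberately designed patterns in the real world, i.e., physical adversarial examples. Prior works have shown that adversarial patches can be printed on clothes to evade DNN-based person detectors. However, the attack success rate of these adversarial examples can drop catastrophically when the viewing angle (i.e., the camera's angle towards the object) changes. To perform a multi-angle attack, we propose Adversarial Texture (AdvTexture). AdvTexture can cover clothes of arbitrary shape, so that people wearing such clothes can hide from person detectors at different viewing angles. We propose a generative method, named Toroidal-Cropping-based Expandable Generative Attack (TC-EGA), to craft AdvTexture with repetitive structures. We printed several pieces of cloth with AdvTexture and then made T-shirts, skirts, and dresses in the physical world. Experiments show that these clothes can fool person detectors in the physical world.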
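The abstract does not describe the cropping step in detail. Assuming that "toroidal cropping" means cropping a window whose indices wrap around both edges of the texture (so the texture behaves like the surface of a torus), a minimal sketch looks as follows; the function name and the list-of-lists texture format are illustrative choices, not part of TC-EGA itself.

```python
def toroidal_crop(texture, top, left, height, width):
    """Crop a height x width window starting at (top, left), wrapping around
    both edges of the texture as if it lay on a torus."""
    rows, cols = len(texture), len(texture[0])
    return [
        [texture[(top + r) % rows][(left + c) % cols] for c in range(width)]
        for r in range(height)
    ]


# Toy 3x3 texture: a crop starting at the bottom-right corner wraps to the top-left.
texture = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]
corner_crop = toroidal_crop(texture, top=2, left=2, height=2, width=2)
wide_crop = toroidal_crop(texture, top=0, left=0, height=1, width=6)
```

A crop wider than the texture simply repeats it, which is one way to see why a pattern trained on such crops can be tiled to cover cloth of arbitrary size.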